Rates of Mutations and Transcript Errors in the Foodborne Pathogen Salmonella enterica subsp. enterica

Abstract Because errors at the DNA level power pathogen evolution, a systematic understanding of the rate and molecular spectra of mutations could guide the avoidance and treatment of infectious diseases. We thus accumulated tens of thousands of spontaneous mutations in 768 repeatedly bottlenecked lineages of 18 strains from various geographical sites, temporal spread, and genetic backgrounds. Entailing over ∼1.36 million generations, the resultant data yield an average mutation rate of ∼0.0005 per genome per generation, with a significant within-species variation. This is one of the lowest bacterial mutation rates reported, giving direct support for a high genome stability in this pathogen resulting from high DNA-mismatch-repair efficiency and replication-machinery fidelity. Pathogenicity genes do not exhibit an accelerated mutation rate, and thus, elevated mutation rates may not be the major determinant for the diversification of toxin and secretion systems. Intriguingly, a low error rate at the transcript level is not observed, suggesting distinct fidelity of the replication and transcription machinery. This study urges more attention on the most basic evolutionary processes of even the best-known human pathogens and deepens the understanding of their genome evolution.


Introduction
Salmonella enterica subsp. enterica (subsequently referred to as S. enterica) is one of the major bacterial foodpoisoning pathogens, causing frequent gastroenteritis outbreaks and tens of thousands of deaths per year, and is closely monitored by disease control and prevention agencies of most developed countries. As a model microorganism, this species is widely used in various biological studies and for developing numerous bio-techniques, such as gene editing, Ames mutagenicity testing, phenotypic, and molecular detection, etc. Genomic and evolutionary research has also provided deep insights into the virulence, host adaptation, and drug resistance of S. enterica (Parkhill non-Typhimurium lineages (Didelot et al. 2011). Nonetheless, one pioneering mutation-accumulation study on the model strain-S. Typhimurium LT2, reported a spontaneous mutation rate of 0.003 per genome per generation (9), a quite high mutation rate among DNA-repair-functional bacteria studied with deep wholegenome sequencing (table 1, blue). However, because only one wild-type mutation-accumulation line with 4.9× depth of sequencing coverage was evaluated in this study, when most advanced bioinformatic tools had not been developed, the results were thus affected by false positives as pre-warned by the authors (Lind and Andersson 2008); a more thorough evaluation of the rate and molecular spectrum of mutation is needed to systematically evaluate the genome stability of S. enterica. Because within-species mutation rate/spectrum variation is not uncommon (Ness et al. 2015;Long, Behringer et al. 2016), such a study ought to involve multiple strains of different origins. A variance of de novo mutations could help to understand mutation-rate determinants, how mutations contribute to genome architecture evolution between closely related lineages, address the evolutionary potential of populations, and so on (Francioli et al. 2015;Ness et al. 2015;Lynch et al. 2016;López-Cortegano et al. 2021). Despite this, genome-wide studies on de novo mutation-rate variance within-species, especially for unicellular organisms including S. enterica, have been extremely rare and possibly barriered by the tremendous amount of resources and time required for the experiments and analyses (Ness et al. 2015;Wu et al. 2021). Notably, the evolutionary mechanisms of pathogenicity genes are still unclear, and whether such a variation that is associated with unusual mutational features is also unknown (Galán 2009;Galán et al. 2014;Journet et al. 2016;Gao et al. 2017).
Thus, to unbiasedly evaluate the spontaneous mutations of this important pathogen, we performed large-scale mutation-accumulation experiments combined with deep whole-genome sequencing (MA-WGS) on 18 S. enterica strains of eight serovars and from world-wide collection sites, including three DNA mismatch repair (MMR) knockout strains (table 2, supplementary dataset S1: table S1, Supplementary Material online). For each strain, a large number of parallel MA lines were initiated from a single ancestral cell and cultured on rich and nonstressing media. Each replicate subline was then repeatedly bottlenecked by single-individual transfers for thousands of generations, providing a setting in which genetic drift dominates selection and ample material for obtaining refined resolution of the mutational landscape (Bateman 1959;Mukai and Yamazaki 1964;Lynch et al. 2008;Halligan and Keightley 2009). Because spontaneous mutations are unbiasedly fixed in each MA line, the intrinsic fidelity of replication machinery and efficiency of the DNA mismatch repair system are revealed.
Compared with extensive studies on DNA errors, that is, genomic mutations, very few studies have focused on RNA errors at the level of transcripts because of technical hurdles to accurately detect in vivo transcript errors. With the recent development of the rolling-circle amplification-based sequencing (CirSeq) method (Lou et al. 2013;Traverse and Ochman 2016;Gout et al. 2017;Fritsch et al. 2021), cross-species comparisons of transcript errors became available. However, a variance of the transcript-error rate and molecular spectrum within one species remains unexplored, which requires many strains each with multiple replicates for highly accurate estimates. Therefore, in parallel with the MA-WGS approach to investigate spontaneous mutations, CirSeq libraries were constructed for the ancestral strains to accurately identify transcript errors and evaluate whether there is a withinspecies variation in the rate and molecular spectrum of transcript errors. Such explorations could also help to develop the biophysical models for cellular stability and test transcription-associated hypotheses.

Results
To quantitatively evaluate the Genomic Mutation rate and reveal within-species mutation-rate variation, we ran MA experiments on 18 S. enterica strains, with a total of 768 MA lines (48 MA lines for each of the 15 MMR-functional strains, performed in two batches; 16 for each of the three MMR-dysfunctional strains). Summing over all sublines, a total of 1,359,047 generations was involved (table 2, supplementary dataset S1: table S1, Supplementary Material online). All final MA lines and the corresponding ancestral lines of each strain were subjected to deep Illumina PE150 genome sequencing (supplementary dataset S1: tables S2-S19, Supplementary Material online), leading to a 116.87× (SE: 9.06) mean depth of sequencing coverage, and an average 90.02% (SE: 1.53%) of the genome covered with highquality reads, using the chromosome sequence of the type and model strain S. Typhimurium LT2 as the reference. Experimental details are listed in table 2, Materials and Methods, and supplementary dataset S1: tables S2-S19, Supplementary Material online. Since lethal or heavily deleterious mutations could be lost even under the extreme bottleneck effect of MA transfers and if mean fitness costs of mutations are high, mutation accumulation could be biased by purifying selection. To test this, for the MA dataset of each strain, we compared the nonsynonymous:synonymous mutations ratio with the random expectation-nonsynonymous:synonymous mutation sites ratio of the genome, and did not detect any sign of mutations being biased by purifying selection in any strain (supplementary dataset S1: table S20, Supplementary Material online); the mean effective population size (N e ) was 14, confirming again that genetic drift dominated selection during the experiments (table 2).
To evaluate transcript errors, we applied a modified CirSeq library construction protocol and Illumina SE300 sequencing on 72 RNA samples (four replicates for each of the 18 strains), generating an average of 5.69 × 10 7 (SE: 0.58 × 10 7 ) bases per sample. After applying quality- Pan et al. · https://doi.org/10.1093/molbev/msac081 MBE control filters and preliminary analyses, 680 total MA lines and 48 CirSeq samples for transcript-error analyses were used for final analyses (table 2, supplementary dataset S1: tables S2-S19, S21, Supplementary Material online), with library-failed and/or low depth-of-coverage lines excluded (supplementary dataset S1: table S22, Supplementary Material online). The details of all base-pair substitutions (BPS), small indels, and transcript errors are shown in figure 1 and supplementary dataset S1: tables S23-S25, Supplementary Material online, and their genome-wide distribution is outlined in figure 2.

Low Genomic Mutation Rate
Genomic mutation rates of the 15 world-wide MMR-functional strains are shown in figure 1 and supplementary dataset S1: tables S2-S16, Supplementary Material online. Analysis of variance (ANOVA) of the among-line variance of the genomic mutation rate reveals significant differences (ANOVA: F = 2.82, P = 0.0004). A further Tukey test shows that the mutation rate of Typhimurium LT2 is significantly higher than those of another three Typhi strains (Typhi CDC9228-77: P = 0.0003; Typhi PNG31: P = 0.0034; Typhi CDC382-82: P = 0.0087; fig. 1). Thus, the mutation rate varies within S. enterica. The accumulated mutations across all 15 strains yield a mean BPS mutation rate of 1.14 × 10 −10 per nucleotide site per generation (SE: 0.08 × 10 −10 ), or 5.54 × 10 −4 (SE: 0.40 × 10 −4 ) BPS mutations per genome per generation, one of the lowest mutation rates among studied bacteria (table 1 and fig. 3A). Because S. Typhi specifically infects human hosts, causing typhi fever, we wondered whether Typhi strains show differences in mutational features relative to those in other serovars. Eight of the 15 MMR-functional strains that we studied are of the Typhi serovar, either model or natural ones (table 2 and fig. 3B, supplementary dataset S1: tables S8-S15, Supplementary Material online). There is no significant difference in the overall mutation rates of Typhi strains (ANOVA, F = 2.07, P = 0.05; fig. 1).
Because the Type III secretion system of S. enterica plays a major role in virulence, enabling the injection of effector proteins into the cytoplasm of host cells, we specifically queried whether the genes of the Salmonella pathogenicity islands (SPI-genomic "islands" containing the virulence genes) (Blanc-Potard et al. 1999;Hensel 2000;Lostroh and Lee 2001) spontaneously mutate at different rates from other genomic regions. To test this, genes hit with BPSs in MMR-dysfunctional S. Typhimurium LT2 strains (ΔmutS, ΔmutL, ΔmutH) were pooled to increase statistical power (supplementary dataset S1: table S26, Supplementary Material online). We found no difference    (table 2), the gray blocks in the green circle represent regions without reads covered in most non-LT2 wild-type strains; the genomic distribution of mutations in S. Typhimurium LT2 ΔmutL, ΔmutS, ΔmutH, all the MMR-functional strains (the four groups of colored tiles from outside to inside)-red, gray, blue, green, purple, and black tiles mark different types of base-substitution mutations-A:T → C:G, A:T → G:C, A:T → T:A, G:C → A:T, G:C → C: G and G:C → T:A, respectively, note that five single tiles were jittered from the tiles below due to overlapping or being too close with other mutations. ORI, origin of replication; TER, replication terminus.  MBE of purifying selection biasing the coding-region mutation rate. To exclude the possibility that this is caused by differences in nucleotide composition, we normalize the mutation rates by taking the GC contents of different regions into consideration ; the BPS mutation rates are then 9.81 × 10 −11 (CI: 8.96 × 10 −11 -1.07 × 10 −10 ) and 2.25 × 10 −10 (CI: 1.90 × 10 −10 -2.65 × 10 −10 ) in coding and intergenic regions, respectively. Intergenic regions are indeed more mutable than coding regions. This might be associated with the possible transcription-associated DNA repair systems and needs further exploration.
The overall mutation rate for small indels is estimated to be 0.15 × 10 −10 (SE: 0.02 × 10 −10 ), about 13% of the BPS rate, with a bias to deletions (insertion bias μ insertion rate /μ deletion rate : 0.80) (table 1, supplementary dataset S1: table S31, Supplementary Material online). Knocking out the MMR decreases the indel-rate/BPS-rate ratio to 6.38%, inferring that the MMR preferentially repairs BPS than the indels. There is no significant indel-rate difference among strains (ANOVA, F = 1.24, P = 0.24); 36.90% of indels occur in simple sequence repeats (SSR) regions for all the MMR-functional strains, whereas 95.25% for MMR-deficient strains do so, that is, the indels are prone to be repaired in repeat regions (supplementary dataset S1:

Influence of DNA Mismatch Repair on Mutations
Comparing the mutational features of MMR + and MMR − strains helps to reveal the specificity of DNA repair and mutagenesis bias of premutations (mutations prior to the DNA repair) . To study this, MA experiments were performed with the MMR + S. Typhimurium LT2 strain and another three MMRdysfunctional strains (ΔmutS, ΔmutL, ΔmutH) derived from LT2, where 80, 2349, 2817, 4287 BPSs were detected, respectively (table 2). Similar to all studied bacteria, the mutation rates of MMR-deficient strains follow a wave-like distribution, with the highest mutation rates around 3 and 9 o'clock of the genome and lowest at the origin of replication, nucleotides with flanking G or C contexts do have elevated mutation rate than those without, for example, the A flanked with G and C could be 47× higher in mutation rate than the G flanked by A and T ( fig. 2, supplementary dataset S1: table S32, and figs. S1, S2, Supplementary Material online) (Lee et al. 2012;Foster et al. 2013;Sung et al. 2015).
As in most other studied bacteria, the MMR of S. Typhimurium LT2 preferentially repairs transitions, especially A:T → G:C ( fig. 3D): MMR repair-efficiencies for transitions and transversions (calculated by (ΔmutS mutation rate − wild-type mutation rate)/ΔmutS mutation rate) are 99.65% and 81.18%, respectively, and the repair efficiency for the A:T → G:C transitions is the highest-99.88% (supplementary dataset S1: table S33, Supplementary Material online). As a consequence, knocking out the MMR elevates the transition to transversion ratio by 34× ( fig. 3D, supplementary dataset S1: table S33, Supplementary Material online). The overall BPS mutation rate under the MMR deficiency is about 174× higher than under the MMR functionality, that is, 0.58% (SE: 0.06%) of the premutations escape the MMR. This is a relatively high MMR repair efficiency among bacteria, for example, 0.94% (SE: 0.11%) in E. coli, and knocking out the MMR elevates its mutation rate by 104× ). 0.52% (SE: 0.06%) BPS escaping the MMR in coding regions is slightly lower than 1.08% (SE: 0.28%) in intergenic regions, demonstrating again the low susceptibility of coding regions to base-substitution mutations at the DNA repair level.
For indels, the MMR deficiency elevates the mutation rate by a factor of 148×, implying the repair of 99.32% (SE: 0.28%) of all preindel mutations by the MMR. Similar to the findings in most other studied bacteria, the MMR also repairs insertions at slightly higher efficiency than deletions (0.33% escape MMR, SE: 0.23% vs. 1.47%, SE: 0.74%) ). The indels falling in SSRs are repaired at an efficiency of 99.76% (SE: 0.17%), higher than those in the non-SSR regions, 90.43% (SE: 5.11%). The indels falling in coding regions are repaired at similar efficiency to those in the intergenic regions (98.33%, SE: 0.84% vs. 98.95%, SE: 0.75%), a pattern different from that of the BPSs.

A Low Error Rate at the Transcript Level is Not Observed
From the successfully sequenced 48 CirSeq samples of 16 strains, we detected a total of 20,270 transcript errors (table 2, supplementary dataset S1: table S25, Supplementary Material online). Transcript errors of the 15 MMR-functional strains yield a mean transcript-error rate estimate of 7.38 × 10 −6 per site per transcription, without a significant within-species variation (SE: 0.21 × 10 −6 ; ANOVA: F = 0.93, P = 0.54; fig. 3E). The gene-specific transcript-error rates do not correlate with the expression level, consistent with the mutation rate versus the expression level (supplementary fig. S3, Supplementary Material online). We do not detect any correlation between mutations and transcript-error rates across all surveyed loci (r = −0.01, P = 0.68), nor any sites as hotspots for both mutations and transcription errors (supplementary dataset S1: table S34, Supplementary Material online). Contrary to its low genomic mutation rate ( fig. 3A), the transcript-error rates are not among the lowest compared with those from previously studied bacteria (Li and Lynch 2020), with G → A and C → U transitions being the main contributors ( fig. 3F).
Transcript errors of the 15 MMR-functional strains show a mean A/T bias of 5.18 (SE: 0.34), much higher than that of mutations 2.32, and mainly resulting from Spontaneous Mutations and Transcript Errors of S. enterica · https://doi.org/10.1093/molbev/msac081 MBE the extremely abundant G → A and C → U transcript errors ( fig. 3B and E). Except for G/C → A/T(U) transitions being the most dominant mutation/transcript-error types, the transcript-error spectrum is highly different from that of mutations ( fig. 3B, E, and F), clearly confirmed by the contrasting transition/transversion ratios: 6.29 (SE: 0.24) versus 1.91 (SE: 0.42) for the transcript errors and mutations, respectively. In terms of functional context, the coding regions and structural RNAs do not show any significant difference in the transcript-error rates, whereas that of other transcribed regions such as UTRs is slightly lower (supplementary dataset S1: table S30, Supplementary Material online).

Discussion
This massive study directly reveals the low mutation rate of the important foodborne pathogen S. enterica and sheds light on its long-term evolution; however, the revealed error rate at the transcript level is not accordingly low. Based on the genomic mutation patterns of 15 MMR-functional strains of diverse serovars, our rate estimate, which is only 1/6 of that previously reported for a single S. Typhimurium LT2 strain in a much smaller study (table 1) (Lind and Andersson 2008), provides direct evidence for the relatively high genome stability of S. enterica among bacteria ( fig. 3A and table 1). By using the false-positive rate 53% reported by Lind and Andersson (2008), we calibrated the LT2 mutation rate to be 3.71 × 10 −10 per site per cell division or 0.0016 per genome per cell division, about twice higher than the one reported in this study: 1.77 × 10 −10 and 0.00086. Based on the calibrated mutation rate, we further calculated the Poisson confidence intervals -(1.48 × 10 −10 , 7.59 × 10 −10 ), which do overlap with the ones reported in this study-(1.40 × 10 −10 , 2.20 × 10 −10 ).
A comparison of mutation patterns between MMR-functional and -dysfunctional S. Typhimurium LT2 strains also shows that the high MMR-efficiency is one major cause of the low mutation rate. Our results are also consistent with previous indirect studies reporting very few pangenome changes between ancient and modern S. enterica lineages in human teeth, high-fidelity DNA polymerases upon stress response, etc. (Koch et al. 1996;Zhou et al. 2018). We note that as in all MA studies using short-read sequencing, reliable structural variants were still not resolved in this study, and future trials using long-read sequencing are required to provide more insight into this issue.
To understand why S. enterica has a low spontaneous mutation rate, we first confirmed that the results are not technical artifacts caused by false negatives (see Materials and Methods, testing for false negative mutations), and then, evaluated the effective population size (N e ) of this species, as a high N e can be conducive to the evolution of low error rates according to the drift-barrier hypothesis ( fig. 3A) (Dettman et al. 2016;Lynch et al. 2016;Long, Sung, et al. 2018;Pan et al. 2021). For haploid organisms, in mutation-drift equilibrium at neutrally evolving sites, N e = π s /2μ, where π s is the pair-wise genetic distance at 4-fold degenerate (silent) sites, and μ is the BPS mutation rate per site per generation. Since 4-fold degenerate sites are actually not absolutely neutral, which is indeed the case as shown in previous studies (Shields et al. 1988;Ochman 2003;Chamary et al. 2006;Long, Sung, et al. 2018), the π s would then be downwardly biased by selection and our N e calculated here would be a lower limit. Combined with numerous recently sequenced natural strains (e.g., by the US Food and Drug Administration), the refined genomic mutation rate estimated here provides an opportunity to calculate N e of S. enterica. To this end, we retrieved raw reads of 210 natural strains with an intact MMR function, which originated from 46 states of the USA and covered all serovars used in this study, and mapped them to the S. Typhimurium LT2 chromosome (supplementary dataset S1: table S35, Supplementary Material online). The all-site (all sites: 4836523; SNPs: 364576) and the silent-site π s (4-fold degenerate sites: 683211; SNPs: 64658) of S. enterica were estimated to be 0.021 and 0.028, with an application of the latter leading to a N e estimate of 1.24 × 10 8 . One study, using genome assemblies of 400 S. enterica strains-different strains from ours-and the mutation rate estimated previously (Lind and Andersson 2008), reported an even higher N e of 4.03 × 10 8 (Lind and Andersson 2008;Bobay and Ochman 2018) (fig. 3A).
Therefore, the relatively large N e among bacteria facilitates effective natural selection for antimutators in this species (fig. 3A). The targets of such selection might include: the DNA mismatch repair system of S. enterica, which is more efficient than those in many other bacteria ; the DNA polymerases, such as UmuDC, which during stress response in S. enterica, has a higher fidelity than those of other bacteria when exposed to UV-induced DNA damage (Koch et al. 1996); other possible unknown antimutator genes, especially in the strains with extremely low mutation rates, such as S. Typhi PNG31 and CDC9228-77, in this study (table 2, supplementary dataset S1: tables S11, S15, Supplementary Material online).
N e of S. enterica estimated using the natural strains of multiple serovars in this study-1.24 × 10 8 -is much larger than the N e for S. Typhi: 2.3 × 10 5 to 1.0 × 10 6 in (Roumagnac et al. 2006). However, if only using π s of the 39 natural S. Typhi strains and their corresponding mutation rates in our dataset, we calculate N e to be 4.93 × 10 5 (π s = 9.47 × 10 −5 ), highly consistent with previous reports for S. Typhi, which is known to have diverged from other Salmonella lineages quite recently and represents one incipient species (Kidgell et al. 2002;Didelot et al. 2011) ( fig. 1, left). However, because of the short divergence time in this recently emergent lineage, these N e estimates will be downwardly biased, which should reach mutationdrift equilibrium, but have not yet been reached. Thus, the fact that the mutation rate of S. Typhi is at the low level found in other S. enterica serovars or other facultative pathogens ( fig. 3A) needs not be inconsistent with the drift-barrier hypothesis, although if true N e is indeed small, Pan et al. · https://doi.org/10.1093/molbev/msac081 MBE an eventual increase in the mutation rate would be expected. Such possible mutation-rate increase and reduced selection conferred by the small N e of S. Typhi could further elevate the genetic diversity of certain pathogenicity genes, and if so, this might lead to situations in which S. Typhi would be more toxic and challenging in future disease treatment.
Not in line with the low mutation rate, a low transcript-error rate of S. enterica was not observed compared with previously studied bacteria (Li and Lynch 2020). Given the fairly high error rate at the transcript level, one adaptive hypothesis argues that a high transcript-error rate increases the fitness of an isogenic population facing environmental challenges, such as the antibiotic threat (Meyerovich et al. 2010). However, our transcriptome-wide evaluation of transcript errors reveals that nonsynonymous errors are less frequent than synonymous errors, indicating the nonadaptive nature of transcript errors (P = 2.61 × 10 −13 , χ 2 test; supplementary dataset S1: table S36, Supplementary Material online). It remains to be seen whether this fairly high transcript-error rate has been driven by directional evolutionary forces or is simply a result of stochastic drift around the drift barrier. Either way, relatively high transcription-error rates may have relevance to clinical approaches to constraining the success of pathogens, inspiring possible killing strategies by magnifying the transcript-error rate, pushing it over the edge of lethal mutagenesis. To achieve a thorough evaluation of cellular stability at the genome, transcriptome, and proteome levels, future technical advancement in the accurate quantification of translation-error rate is required.
In contrast to the recent advances in the mutational features of nonpathogenic microbes (i.e., biosafety level 1 strains), the studies of human pathogens, especially obligate ones, have barely been explored by mutationaccumulation techniques combining deep whole-genome sequencing. This is mostly because of the technical barriers from limited access to biosafety level 2 or above labs or high costs for numerous lines requiring long-term culturing/ genome sequencing. Mutational patterns, particularly the mutation spectra, are known to be influenced by the genome architecture and environmental factors such as pH, antibiotics, etc. (Long, Kucukyildirim, et al. 2015;Strauss et al. 2017). Different from opportunistic human pathogens, most obligate ones constantly experience harsh challenges from the immune system, medicines, microbial competitors, etc., and usually have quite streamlined and minimized genomes. Theoretical studies have predicted the elevated mutation rates due to the small N e (Sung et al. 2012;Lynch et al. 2016), whereas genome-wide mutation spectra are mostly unknown, let alone, how host environments influence their mutagenesis and further affect the pathogenicity evolution. Therefore, systematic evaluation on the genomic mutations of diverse microbial pathogens, each with multiple strains-especially mutation rate in their natural habitats (Zhao et al. 2021)-is urgently needed for testing the generalized mutation-rate evolution hypotheses, understanding their long-term evolution and transmission, as well as guiding infectious-disease treatment and vaccine development at the DNA or RNA levels.

Strains and MA Experiments
In this study, 18 strains were ordered from the American Type Culture Collection (ATCC, VA, USA) and Salmonella Genetic Stock Center (SGSC, University of Calgary, Alberta, Canada); 15 of them were MMR-functional and the remaining three were MMR-knockout (table 2). All the biosafety level 2 procedures were performed under the protocol of 15-038, approved by the Institutional Biosafety Committee, Indiana University at Bloomington. We also sequenced the whole genomes of the ancestral lines of the 18 strains. About 48 and 16 MA lines were initiated for each of the MMR-functional and -dysfunctional strains, respectively; the 48 MA lines for each MMR-functional strain were transferred in two contiguous batches. All the MA lines were cultured on LB agar plates at 37°C, and repeatedly single-colony transferred, every day for nearly 3 months (table 2). For each strain, we calculated the colony forming units (CFU) using single colonies of five randomly chosen MA lines, every 3 weeks. log 2 (CFU) was then used as the number of cell divisions from a single cell growing to a colony. The grand mean of cell divisions in each colony was calculated for the MA lines of each strain after the MA transfer was finished. The total transfers multiplied by the grand mean were the total number of cell divisions that each MA line passed. After genome sequencing and preliminary analysis, we removed the MA lines that were crosscontaminated or if the depth of coverage was ,20× (table 2, supplementary dataset S1: tables S2-S19, S22, Supplementary Material online).

Calculation of Population Genetic Parameters
We downloaded a total of 211 natural strains of S. enterica, belonging to eight serotypes, identified by the Food and Spontaneous Mutations and Transcript Errors of S. enterica · https://doi.org/10.1093/molbev/msac081 MBE Drug Administration USA from the NCBI SRA database, which were collected from 46 states of the USA (supplementary dataset S1: table S32, Supplementary Material online). We used Unicycler (ver. 0.4.8) (Wick et al. 2017) to assemble the draft genomes of all the strains, then blasted their crucial MMR genes-mutS, mutL, mutH, and UvrD with the MMR gene sequences of S. Typhimurium LT2. One strain (SRR8950558) was then removed due to an incomplete mutL gene, as we required all strains to have intact reading frames of the MMR genes. We used BWA (ver. 0.7.12) (Li and Durbin 2009) to align the raw reads of the 210 natural strains to the chromosome of S. Typhimurium LT2 (NCBI accession number: NC_003197.1) and calculated π s , the average pair-wise genetic distance at 4-fold degenerate sites using the following formula: where π is the average number of nucleotide differences per site between two DNA sequences in all possible pairs of the strains, n is the number of strains. The effective population size (N e ) of S. enterica was calculated using the following formula, where μ is the mutation rate per site per cell division: To calculate dN/dS of the pathogenicity genes, we first retrieved the genes in Salmonella pathogenicity islands of S. Typhimurium LT2 (marT,misL,pipA,rhuM,SopE2,SseC,SseD,SseE,SseF,ssrB), as well as the homologs of other S. enterica strains with complete genomes available in the NCBI Genome database (supplementary dataset S1: tables S27, S28, Supplementary Material online). We then used ClustalW2 (ver. 2.1) (Larkin et al. 2007) to align the protein sequences, then used the pal2nal.pl script (Suyama et al. 2006) to match the aligned protein sequences with the corresponding DNA sequences in the genome and convert the amino acids to genetic codes. IQ-TREE (ver. 2.1.2) (Minh et al. 2020) was used to build the phylogenetic tree with the following parameters: -alrt 1000 -B 1000. The module codeml in PAML (ver 1.3.1) (Yang 2007)  MBE to construct consensus sequences. The quality score of each consensus base call was recalculated from the raw quality score of each repeat. A probability of the erroneous consensus call of 10 −7 or lower is required for the downstream analysis. Transcript errors were called if mismatches between consensus and reference bases were supported by ,1% of reads that were mapped to the corresponding loci. Transcript errors that could be explained by genetic variations among the multiple copies of genes were further excluded as false positives. The expected number of sites with co-occurring mutations and transcript errors in the whole genome are calculated by: where μ G is the pooled mutation rate per site per cell division of three MMR-dysfunctional strains, N gen denotes the total number of cell divisions over MA experiments of the three strains, μ T is the transcript-error rate per site of 15 MMR-functional strains, and n refers to the average expression level of one locus, G is the genome size.
To find out the association between expression levels and mutations/transcript errors, we retrieved the RNAseq data (SRR5980348-63) of S. Typhimurium LT2, which were all in vegetative growth, for expression level analysis. The reads were mapped to the chromosome sequence of S. Typhimurium LT2 using Hisat2 (ver. 2.1.0) (Kim et al. 2015), then sam files were converted to bam format using samtools-1.3.1. StringTie (ver. 2.1.5) (Kovaka et al. 2019) and Ballgown (ver. 2.5.3) (Frazee et al. 2014) were used to calculate the expression level of each gene. Finally, the expression level of each gene was calculated using the mean of the expression levels of the sample replicates.

Testing for False Negative Mutations
The average genomic mutation rate reported here is 1/6 of the previously reported for S. Typhimurium LT2 (Lind and Andersson 2008). Artifacts resulting from the reference genome used-specifically the MA lines of non-LT2 serovars, analysis methods, batch effects of culturing media, etc., might lead to such differences.
To test if such low mutation-rate results from false negatives caused by the reference genome used (chromosome of S. Typhimurium LT2) or the mutation analysis method applied, we chose six strains used in this study, with complete genomes reported (the NCBI GenBank assembly accession numbers for these six strains-S. Agona BAA-707: GCA_000483935.1, S. Bareilly 9115: GCA_000487355.1, S. Enteritidis LJH608: GCA_003418895.1, S. Paratyphi A 9150: GCA_000011885.1, S. Typhi CT18, Ty2: GCA_000195995.1, GCA_000007545.1), and re-analyzed mutations with the GATK pipelines used in the preceding analysis, and found almost no difference (supplementary dataset S1: table S37, Supplementary Material online). The application of another consensus mutation-analysis-pipeline with the genome of S. Typhimurium LT2 as reference, which uses fairly simple and loose filters, also led to highly similar mutations (supplementary dataset S1: table S37, Supplementary Material online). To further test the false-negative possibility resulting from the usage of the LT2 chromosome as reference, instead of the ancestors (even though on average .90% of the MA lines' genomes were covered with highquality reads), we de novo assembled draft genomes of the 15 wild-type strains (table 2), using raw reads of the MA ancestors and Unicycler (ver. 0.4.8) with default parameters (Wick et al. 2017) (NCBI SRA, BioProject No.: PRJNA 397616). We then ran all the mutation and transcript-error analyses again with the ancestral draft genomes on the MA lines or the Cir-seq samples, and found no significant differences from those using the LT2 chromosome sequence as a reference genome (supplementary dataset S1: tables S38 and S39, Supplementary Material online), adding further support to the low false negative rate of this study.
We also query if batch effects of the culturing media caused mutation rate underestimates. For each of the 15 MMR-functional strains, the MA lines were transferred in two consecutive batches with the same procedures, and the two batches did not show any significant difference in the mutation rate or spectrum, even if they were done at different time points (Paired t-test: P = 0.74; supplementary figs. S4 and S5, Supplementary Material online). Thus, the mutation rate of S. enterica is accurate and not technical artifacts.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online. J.N., M.L. wrote the manuscript. All authors read and approved the final submitted version.

Data Availability
All sequencing data in this research are available at NCBI SRA with the BioProject Number of PRJNA397616.